Support for CUDA-aware MPI runs, half-precision floating point (fp16), and reduce_scatter communication #1
+133
−8